Skip to content

API: Remove integer position args from xy for plotting #20371

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

API: Remove integer position args from xy for plotting #20371

wants to merge 2 commits into from

Conversation

masongallo
Copy link
Contributor

This is based on discussion from #20000 where we decided it was confusing to allow df.plot to support both labels AND positions. I wasn't sure where to put this in whatsnew, so lmk and I'll update it.

# n.b. there appears to be no public method to get the colorbar
# label
assert ax.collections[0].colorbar._label == 'z'
if self.mpl_ge_1_3_1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger we should be able to blow this code away as we > 1.4.3, yes (separate PR)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a sub-section in the whatsnew about this in api-breaking changes. Note we may want to consider deprecating this and removing in a future version @jorisvandenbossche @TomAugspurger . though I am fine with just changing it.

df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]})
with pytest.raises(ValueError):
df.plot(x=x, y=y)

@pytest.mark.parametrize("x,y,colnames", [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test that checks the raise when passing ints and a non-int index

@jorisvandenbossche
Copy link
Member

Note we may want to consider deprecating this and removing in a future version @jorisvandenbossche @TomAugspurger . though I am fine with just changing it.

If it worked, I think we should deprecate it. But did it actually work? Because #20056 seems to indicate not? (if it didn't work, of course fine to just remove)

@jorisvandenbossche
Copy link
Member

So it seems that for line, bar, area it doesn't work, but for scatter, pie and hexbin it does?

@jreback jreback added Visualization plotting Deprecate Functionality to remove in pandas labels Mar 16, 2018
@TomAugspurger
Copy link
Contributor

I don't think we should remove this. I know positional plotting gets some use, especially for exploratory analysis when you have long column names.

I'll look more at why the original issue was failiing.

@masongallo
Copy link
Contributor Author

So it seems that for line, bar, area it doesn't work, but for scatter, pie and hexbin it does?

@jorisvandenbossche yes I believe this is due to the structure of the code - the index setting I mentioned in #20056 only happens to kind not from that set - called _dataframe_kinds and _series_kinds.

I don't think we should remove this. I know positional plotting gets some use, especially for exploratory analysis when you have long column names.

@TomAugspurger why not ask the user to supply data.columns[ind] instead of us making assumptions and calculating it for them? IMO that would make the API less complex.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 16, 2018 via email

@masongallo
Copy link
Contributor Author

IME, DataFrame.plot is most often used in exploratory analysis. For this,
speed typing becomes an API consideration.

That's fair - you have much more context than I do on usage.

I don't think we ever make assumptions?

We make an assumption by allowing mixed types in column names - i.e. we check with holds_integer and then assume a given int is meant as a column name.

Seems like we don't have consensus on whether this should be removed from API. How should we proceed? I can close this and fix the integer indexing bug in #20000 ?

@TomAugspurger
Copy link
Contributor

We make an assumption by allowing mixed types in column names - i.e. we check with holds_integer and then assume a given int is meant as a column name.

Perhaps it's just semantics then, but I wouldn't call that assuming :) To me the behavior is clear (but could be better documented). Specifying x and y as positions is allowed if and only if the columns holds no integers.

I can close this and fix the integer indexing bug in #20000 ?

If you're comfortable with maybe wasting some effort, I would work around the bug in #20000. We can wait to hear what the other maintainers say before closing or working further on this issue, but I would be sad to see this go.

@jorisvandenbossche
Copy link
Member

Perhaps it's just semantics then, but I wouldn't call that assuming :) To me the behavior is clear (but could be better documented). Specifying x and y as positions is allowed if and only if the columns holds no integers.

Although there might be some "guessing" cases (mixed integer columns), this is also behaviour that used a lot throughout the code base (eg level can often be specified both as integer or as name).

--

I don't have a strong feeling about this. If it is not too difficult to fix this to actually work, I am certainly fine with keeping it.

@masongallo
Copy link
Contributor Author

FYI: will be closing this in favor of #20000

@masongallo masongallo closed this Mar 22, 2018
@masongallo masongallo deleted the remove-positions-plotting branch March 22, 2018 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Deprecate Functionality to remove in pandas Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: df.plot fails when given x,y args as positions
4 participants